Skip to content

fix: detect ScyllaDB via SUPPORTED protocol extensions, not shard count#910

Open
mykaul wants to merge 1 commit into
scylladb:masterfrom
mykaul:fix-scylla-detection-without-sharding
Open

fix: detect ScyllaDB via SUPPORTED protocol extensions, not shard count#910
mykaul wants to merge 1 commit into
scylladb:masterfrom
mykaul:fix-scylla-detection-without-sharding

Conversation

@mykaul

@mykaul mykaul commented Jun 16, 2026

Copy link
Copy Markdown

Summary

The driver previously identified ScyllaDB solely by the presence of shard-related fields (SCYLLA_NR_SHARDS, SCYLLA_SHARD, etc.) in the SUPPORTED response.
When shard-awareness is disabled on the server side (allow_shard_aware_drivers: false), those fields are absent, causing the driver to misidentify a ScyllaDB cluster as Cassandra.
The immediate consequence is that the driver leaves peers_v2 enabled (which ScyllaDB does not support) and fails to connect — the same regression described in scylladb/gocql#902.

This fix mirrors scylladb/gocql#903.

Changes

cassandra/protocol_features.py

  • Add ProtocolFeatures.is_scylla bool (default False), set by the new detect_scylla() static method.
  • detect_scylla() returns True when any known Scylla-specific extension key (SCYLLA_LWT_ADD_METADATA_MARK, SCYLLA_RATE_LIMIT_ERROR, TABLETS_ROUTING_V1) is present in the SUPPORTED response, or when sharding_info is populated.
    These keys are advertised by ScyllaDB regardless of whether shard-awareness is enabled, so is_scylla stays True even with sharding disabled.
  • Fix latent crash in parse_sharding_info: int(shard_id) raised TypeError when SCYLLA_PARTITIONER/SCYLLA_SHARDING_ALGORITHM was present without SCYLLA_SHARD. Default to 0.

cassandra/shard_info.py / cassandra/c_shard_info.pyx

  • _ShardingInfo.__init__ / ShardingInfo.__init__: guard int(shards_count) and int(sharding_ignore_msb) against None (default to 0), matching the existing pattern used for shard_aware_port. Fixes the remaining crash path where a shard-defining field was present without SCYLLA_NR_SHARDS.

cassandra/cluster.py

  • _uses_peers_v2 disable and _metadata_request_timeout now gate on is_scylla instead of sharding_info is not None, so both work correctly when sharding is disabled.

cassandra/metadata.py

  • _is_not_scylla() now uses not is_scylla instead of shard_id is None.
    The old check was always False after the OPTIONS exchange (Cassandra gets shard_id=0, and 0 is None is False), meaning trigger-metadata queries were silently skipped even for Cassandra/DSE connections.

tests/unit/test_protocol_features.py

  • 7 new unit tests covering: LWT-only / rate-limit-only / tablets-only detection, full-sharding detection, pure-Cassandra non-detection, and two crash-regression tests for the int(None) paths.

Shard-aware pooling is unaffected

pool.py continues to gate per-shard connection opening on sharding_info is not None.
A ScyllaDB node with sharding disabled correctly gets is_scylla=True (so peers_v2 is disabled and USING TIMEOUT works) but sharding_info=None (so no per-shard connections are opened). This matches the gocql#903 fix's isScyllaConn() && nrShards != 0 guard.

Note on Cassandra behavioral change

_is_not_scylla() was previously always returning False post-OPTIONS for Cassandra (shard_id=0, 0 is None = False), so system_schema.triggers queries were silently skipped even for Cassandra/DSE. With this fix, Cassandra/DSE connections
will correctly issue the trigger-metadata query and populate TableMetadata.triggers.

See also

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 05907220-9e38-4e26-ad96-91b4bd56af06

📥 Commits

Reviewing files that changed from the base of the PR and between bea45ac and b223706.

📒 Files selected for processing (6)
  • cassandra/c_shard_info.pyx
  • cassandra/cluster.py
  • cassandra/metadata.py
  • cassandra/protocol_features.py
  • cassandra/shard_info.py
  • tests/unit/test_protocol_features.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • cassandra/shard_info.py
  • cassandra/c_shard_info.pyx
  • cassandra/metadata.py
  • cassandra/cluster.py
  • cassandra/protocol_features.py
  • tests/unit/test_protocol_features.py

📝 Walkthrough

Walkthrough

ProtocolFeatures now stores an is_scylla flag derived from SUPPORTED data and handles missing sharding fields by defaulting them to 0. ControlConnection._try_connect() uses connection.features.is_scylla to choose peers_v2 and metadata timeout behavior, and SchemaParserV3._is_not_scylla() now returns not is_scylla. Unit tests cover detection via Scylla-specific extensions, sharding metadata, Cassandra-only payloads, and partial sharding advertisements.

Sequence Diagram(s)

sequenceDiagram
  participant ProtocolFeatures
  participant ControlConnection
  participant SchemaParserV3
  ProtocolFeatures->>ProtocolFeatures: parse_from_supported(supported)
  ProtocolFeatures->>ProtocolFeatures: detect_scylla(supported, sharding_info)
  ProtocolFeatures-->>ControlConnection: is_scylla
  ControlConnection->>ControlConnection: set _uses_peers_v2 and metadata timeout
  ControlConnection->>SchemaParserV3: _is_not_scylla(features)
  SchemaParserV3-->>ControlConnection: not is_scylla
Loading
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed It concisely states the main change: ScyllaDB detection now uses SUPPORTED extensions rather than shard count.
Description check ✅ Passed It clearly summarizes the bug, rationale, key code changes, and tests, though it omits some checklist items like docs/source updates and Fixes annotations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mykaul mykaul force-pushed the fix-scylla-detection-without-sharding branch from a64d05e to bea45ac Compare June 16, 2026 13:32
@mykaul mykaul requested a review from Copilot June 16, 2026 15:50

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes ScyllaDB detection by using Scylla-specific SUPPORTED protocol extension keys (and sharding info when present) instead of relying on shard-related fields alone, preventing misclassification as Cassandra when shard-awareness is disabled and avoiding peers_v2-related connection failures.

Changes:

  • Add ProtocolFeatures.is_scylla and implement Scylla detection via known Scylla-specific SUPPORTED extension keys.
  • Harden sharding parsing/initialization against missing shard fields to prevent int(None) crashes.
  • Gate peers_v2 usage, USING TIMEOUT, and trigger-metadata behavior on is_scylla instead of shard-awareness heuristics; add unit tests for detection/regressions.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
cassandra/protocol_features.py Introduces is_scylla detection and adjusts sharding parsing to avoid crashes.
cassandra/shard_info.py Guards integer conversion for missing shard-related fields.
cassandra/c_shard_info.pyx Mirrors shard-info conversion guards in the Cython implementation.
cassandra/cluster.py Switches peers_v2 disabling and metadata timeout gating to is_scylla.
cassandra/metadata.py Updates Scylla-vs-not logic to use features.is_scylla.
tests/unit/test_protocol_features.py Adds unit tests for Scylla detection and crash regressions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +101 to +103
resolved_shard_id = int(shard_id) if shard_id is not None else 0
return resolved_shard_id, _ShardingInfo(shard_id, shards_count, partitioner, sharding_algorithm, sharding_ignore_msb,
shard_aware_port, shard_aware_port_ssl)
Comment on lines +91 to +95
# SCYLLA_PARTITIONER passes the sharding guard, so sharding_info is populated
# with zero defaults rather than crashing.
assert pf.sharding_info is not None
assert pf.sharding_info.shards_count == 0
assert pf.sharding_info.sharding_ignore_msb == 0
Comment on lines +106 to +108
assert pf.is_scylla is True
assert pf.sharding_info is not None
assert pf.sharding_info.shards_count == 0
@mykaul

mykaul commented Jun 26, 2026

Copy link
Copy Markdown
Author

@coderabbitai review

@mykaul mykaul marked this pull request as ready for review June 26, 2026 17:28
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/test_protocol_features.py (1)

79-108: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the normalized pf.shard_id in these regressions.

These cases exercise the missing-SCYLLA_SHARD path, but they never check the new top-level normalization to 0. Adding assert pf.shard_id == 0 would pin the actual fix and catch a regression back to None.

Suggested assertions
     def test_scylla_without_sharding_no_crash(self):
         ...
         assert pf.is_scylla is True
+        assert pf.shard_id == 0
         assert pf.sharding_info is not None
         assert pf.sharding_info.shards_count == 0
         assert pf.sharding_info.sharding_ignore_msb == 0

     def test_scylla_sharding_algorithm_only_no_crash(self):
         ...
         assert pf.is_scylla is True
+        assert pf.shard_id == 0
         assert pf.sharding_info is not None
         assert pf.sharding_info.shards_count == 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_protocol_features.py` around lines 79 - 108, The regression
tests in test_scylla_without_sharding_no_crash and
test_scylla_sharding_algorithm_only_no_crash only verify sharding_info, but they
do not assert the new normalized top-level shard_id value. Update both
ProtocolFeatures.parse_from_supported regression cases to also check that
pf.shard_id is 0, so the tests pin the missing-SCYLLA_SHARD normalization
behavior and catch any return to None.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/test_protocol_features.py`:
- Around line 79-108: The regression tests in
test_scylla_without_sharding_no_crash and
test_scylla_sharding_algorithm_only_no_crash only verify sharding_info, but they
do not assert the new normalized top-level shard_id value. Update both
ProtocolFeatures.parse_from_supported regression cases to also check that
pf.shard_id is 0, so the tests pin the missing-SCYLLA_SHARD normalization
behavior and catch any return to None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6e195da2-f03f-4a94-a124-f7657f65a989

📥 Commits

Reviewing files that changed from the base of the PR and between 763af09 and bea45ac.

📒 Files selected for processing (6)
  • cassandra/c_shard_info.pyx
  • cassandra/cluster.py
  • cassandra/metadata.py
  • cassandra/protocol_features.py
  • cassandra/shard_info.py
  • tests/unit/test_protocol_features.py

The driver previously identified ScyllaDB solely by the presence of
shard-related fields (SCYLLA_NR_SHARDS, SCYLLA_SHARD, etc.) in the
SUPPORTED response.  When shard-awareness is disabled on the server
side (allow_shard_aware_drivers: false), those fields are absent,
causing the driver to misidentify a ScyllaDB cluster as Cassandra.

Changes:
- Add ProtocolFeatures.detect_scylla() and is_scylla flag, set from
  known Scylla-specific extension keys (LWT, rate-limit, tablets
  routing) or sharding_info presence
- Replace sharding_info check with is_scylla for peers_v2 and
  metadata_request_timeout in cluster.py and metadata.py
- Guard int() calls against None in _ShardingInfo construction
  (c_shard_info.pyx, shard_info.py)
- Add tests for is_scylla detection and regression tests for
  missing shard fields
- Assert pf.shard_id normalization in test regressions (response
  to CodeRabbit review)
@mykaul mykaul force-pushed the fix-scylla-detection-without-sharding branch from bea45ac to b223706 Compare June 29, 2026 20:21
Comment thread cassandra/metadata.py
Comment on lines +2581 to +2588
"""Check if NOT connected to ScyllaDB.

Uses the is_scylla flag from ProtocolFeatures, which is set from the
presence of Scylla-specific extension keys in the SUPPORTED response
(e.g. SCYLLA_LWT_ADD_METADATA_MARK, SCYLLA_RATE_LIMIT_ERROR) and
therefore remains True even when shard-awareness is disabled.
"""
return not getattr(getattr(self.connection, 'features', None), 'is_scylla', False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProtocolFeatures always have is_scylla attribute after your changes, why do you need getattr here?

Also, is it possible for connection to not have features?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was mostly a copy-paste from the original code, IIRC. I'll look into it.
(I'm also unsure - what if self.connection is null?)

Comment on lines +98 to +103
# SCYLLA_SHARD may be absent even when other shard fields are present
# (e.g. the connection landed on shard 0 and the server omits the field).
# Default to 0 to avoid int(None) crash.
resolved_shard_id = int(shard_id) if shard_id is not None else 0
return resolved_shard_id, _ShardingInfo(shard_id, shards_count, partitioner, sharding_algorithm, sharding_ignore_msb,
shard_aware_port, shard_aware_port_ssl)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can SCYLLA_SHARD be absent, as this comment indicates?
protocol-extensions.md doesn't mention it being optional. Scylla code also seems to insert it unconditionally: https://github.com/scylladb/scylladb/blob/b9d508d4097a91181da582305900c9318a9fec50/transport/server.cc#L2046-L2058

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the whole issue that started this thread - if Scylla server-side disables sharding, it doesn't advertise it whatsoever.

@Lorak-mmk Lorak-mmk Jun 30, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not what I'm asking.

SCYLLA_SHARD may be absent even when other shard fields are present

Afaik disabling sharding on scylla-side will make it not send ANY sharding-related values in SUPPORTED, which makes this comment false. See the code fragment I linked - all sharding related values are guarded by _server._config.allow_shard_aware_drivers. I don't see a possible scenario where SCYLLA_SHARD is absent but e.g. SCYLLA_NR_SHARDS is present.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. OK.

Comment thread cassandra/shard_info.py
Comment on lines 22 to 29

def __init__(self, shard_id, shards_count, partitioner, sharding_algorithm, sharding_ignore_msb, shard_aware_port, shard_aware_port_ssl):
self.shards_count = int(shards_count)
self.shards_count = int(shards_count) if shards_count else 0
self.partitioner = partitioner
self.sharding_algorithm = sharding_algorithm
self.sharding_ignore_msb = int(sharding_ignore_msb)
self.sharding_ignore_msb = int(sharding_ignore_msb) if sharding_ignore_msb else 0
self.shard_aware_port = int(shard_aware_port) if shard_aware_port else None
self.shard_aware_port_ssl = int(shard_aware_port_ssl) if shard_aware_port_ssl else None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is possible for shards_count / sharding_ignore_msb when other values are present. Either sharding info is enabled, in which those values are present, or it is not, in which case we should not attempt to construct _ShardingInfo object. In both cases, those new guards are unnecessary.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you want me to change the code to check for everything works with or without _ShardingInfo? I thought it might be a slightly bigger change. I can look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants